Skip to content

feat: add experimental Codex session supervisor#147

Closed
ndycode wants to merge 15 commits intomainfrom
git-clean/05-session-supervisor-single
Closed

feat: add experimental Codex session supervisor#147
ndycode wants to merge 15 commits intomainfrom
git-clean/05-session-supervisor-single

Conversation

@ndycode
Copy link
Owner

@ndycode ndycode commented Mar 20, 2026

Summary

Adds an opt-in Codex CLI session supervisor that keeps interactive sessions on a healthy account, integrates quota-aware rotation into startup, and hardens the storage and quota paths needed to make supervisor handoffs safe.

What Changed

  • Added scripts/codex-supervisor.js and routed scripts/codex.js through it when codexCliSessionSupervisor is enabled.
  • Kept the feature disabled by default behind the experimental codexCliSessionSupervisor setting.
  • Fixed the latest review follow-ups around startup abort handling, monitor-exit propagation, child-env sanitization intent, bounded snapshot probe caching, direct heartbeat-loss coverage, pending-restart preservation after monitor failures, and the full stale-lock TOCTOU regression path.
  • Made storage-path equivalence symlink-aware with async realpath checks so exports do not false-fail when the active path and transaction path resolve to the same file through different prefixes.
  • Added quota-probe and supervisor regressions for abort-before-fallback behavior, monitor-failure plus requested-restart handling, and stale-lock refresh races.

Validation

Passed:

  • npx vitest run test/quota-probe.test.ts
  • npx vitest run test/codex-supervisor.test.ts
  • npx vitest run test/storage.test.ts
  • npm run test:session-supervisor:smoke
  • npm run lint
  • npm run typecheck
  • npm run build

Blocked Baseline

  • npm test still fails on an inherited docs issue already present on origin/main: README.md and docs/README.md both link to missing docs/releases/v1.2.0.md.

Docs Impact

  • No product docs changed in this PR.
  • The only docs-related failure is the inherited missing release-note links already present on origin/main.

Risk and Rollback

  • Risk: medium, because the supervisor touches interactive orchestration and storage locking.
  • Rollback: disable codexCliSessionSupervisor, or revert this PR branch if the experiment needs to be backed out.

Notes

  • This remains the single PR for the supervisor work.
  • The supervisor is still disabled by default.

note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

this pr adds an opt-in codexCliSessionSupervisor that monitors running codex sessions, probes quota, and transparently rotates to a healthy account via codex resume <sessionId> when a rate-limit or near-exhaustion is detected. it's disabled by default behind an experimental flag and integrates into the existing scripts/codex.js entrypoint. the storage and quota-probe layers are hardened with symlink-aware path comparison, abort-signal propagation, and a safer stale-lock removal path.

key points:

  • scripts/codex-supervisor.js (2418 lines) is the core: lock acquisition, heartbeat-based lease renewal, quota probing with a bounded snapshot cache, session-binding discovery via .jsonl scan, and a retry loop with pre-warming
  • lib/storage.ts replaces a silent branch-2 fallback in exportAccounts with a hard throw on storage-path mismatch, guarded by async realpath-aware path comparison
  • lib/quota-probe.ts adds AbortSignal support with proper listener cleanup and abort-before-fallback guards
  • previous review issues (safeUnlink platform guard, parseNumberEnv max clamp, stale-lock TOCTOU, buildCodexChildEnv env sanitisation, module-level constant baking) appear addressed in this revision
  • two correctness issues remain: (1) the lock file is stranded with empty content if handle.writeFile throws after fs.open("wx") succeeds — subsequent acquisitions see EEXIST, fail to parse the empty payload, and then observe a recent mtime so isSupervisorLockStale returns false, blocking until TTL; (2) normalizeStorageComparisonPath swallows all realpath errors (not just ENOENT), causing a silent fallback to resolvePath under permission errors which can produce false path-mismatch export blocks
  • listJsonlFiles has no file-count limit — a large sessions directory causes unbounded I/O on every polling cycle (refreshed every ~2 s); this is a windows filesystem performance risk
  • vitest coverage is broad but missing a case for the lock-file stranding path and large-directory scan behavior

Confidence Score: 3/5

  • safe to merge with the feature flag off, but the lock-stranding bug and ENOENT-only catch should be resolved before enabling in production
  • the feature is disabled by default which limits blast radius. most previously-flagged issues (safeUnlink platform guard, parseNumberEnv max, TOCTOU, env sanitisation) appear addressed. two new correctness issues remain: empty lock file left stranded after a write failure can block acquisition for the full TTL (concurrency issue on all platforms, worse on windows), and the broad catch {} in normalizeStorageComparisonPath silently degrades symlink detection under permission errors. the unbounded listJsonlFiles scan is a latency risk but not a correctness bug. test coverage is strong overall but the stranded-lock path has no vitest test.
  • scripts/codex-supervisor.js (lock acquisition write-failure cleanup) and lib/storage.ts (normalizeStorageComparisonPath error handling)

Important Files Changed

Filename Overview
scripts/codex-supervisor.js 2418-line new file implementing the session supervisor; has a lock-file stranding bug on write failure and an unbounded session directory scan that can cause I/O-heavy polling loops
lib/storage.ts Adds symlink-aware path comparison for export guard, but normalizeStorageComparisonPath silently swallows EPERM/EACCES (should only catch ENOENT); also adds defensively-correct path-mismatch throw replacing the silent fallback; sequential awaits in areEquivalentStoragePaths are a minor inefficiency
scripts/codex.js Routes interactive sessions through supervisor; supervisorDidStartupSync flag correctly gates the post-session sync, and supervisorDidForward correctly handles the non-supervisor fallback path
lib/quota-probe.ts Adds AbortSignal support with correct abort-forwarding, listener cleanup, and abort-before-fallback guards; abort signal is properly propagated through the fetch controller
test/codex-supervisor.test.ts 2715-line test suite covering rotation, lock acquisition, env stripping, and abort paths; missing coverage for lock-file stranding on writeFile failure and large-session-directory scan performance
test/storage.test.ts Adds path-mismatch export guard tests including Windows path normalization and symlink equivalence; missing a case for EACCES in normalizeStorageComparisonPath

Sequence Diagram

sequenceDiagram
    participant codex.js
    participant supervisor as codex-supervisor.js
    participant runtime as loadSupervisorRuntime
    participant lock as withSupervisorStorageLock
    participant manager as AccountManager
    participant child as codex child process
    participant monitor as monitor loop

    codex.js->>supervisor: runCodexSupervisorIfEnabled()
    supervisor->>runtime: loadSupervisorRuntime()
    runtime-->>supervisor: { loadPluginConfig, fetchCodexQuotaSnapshot, ... }
    supervisor->>lock: ensureLaunchableAccount()
    lock->>manager: loadFromDisk()
    manager-->>lock: accounts
    lock->>runtime: fetchCodexQuotaSnapshot(account)
    runtime-->>lock: snapshot / QuotaProbeUnavailableError
    lock-->>supervisor: { ok, account }
    supervisor->>supervisor: syncBeforeLaunch()
    supervisor->>child: spawnRealCodex(codexBin, args, buildCodexChildEnv())
    supervisor->>monitor: start monitor loop
    loop poll every 300ms
        monitor->>runtime: probeAccountSnapshot(currentAccount)
        runtime-->>monitor: snapshot
        monitor->>monitor: computeQuotaPressure()
        alt pressure.prewarm
            monitor->>supervisor: maybeStartPreparedResumeSelection()
        end
        alt pressure.rotate && idle
            monitor->>child: requestChildRestart(SIGINT→SIGTERM→SIGKILL)
            monitor->>monitor: set requestedRestart
        end
    end
    child-->>supervisor: exit(code)
    supervisor->>lock: markCurrentAccountForRestart()
    supervisor->>lock: commitPreparedSelection() or ensureLaunchableAccount()
    lock->>manager: persistActiveSelection()
    supervisor->>supervisor: syncBeforeLaunch()
    supervisor->>child: spawnRealCodex(resume sessionId, buildCodexChildEnv())
    supervisor-->>codex.js: exitCode
Loading

Comments Outside Diff (2)

  1. scripts/codex-supervisor.js, line 704-717 (link)

    lock file stranded on write failure — blocks future acquisition until TTL expires

    fs.open(lockPath, "wx") creates the file exclusively, but if handle.writeFile then throws (e.g., ENOSPC, EACCES, intermittent I/O), the finally block only closes the handle without removing the empty file. the outer catch re-throws because the writeFile error code is not a transient lock-acquire code (EEXIST / windows EPERM/EBUSY), so the stranded file is never cleaned up.

    on the next acquisition attempt any process will see EEXIST. isSupervisorLockStale will get null from JSON.parse("") → falls through to fs.stat → mtime is very recent → isSupervisorLockStale returns false. the lock is stuck until TTL expires (30 s default), which is the same class of concurrency problem as the stale-lock TOCTOU. this is a windows filesystem safety risk too since NTFS can produce EPERM on short-lived file contention.

    const handle = await fs.open(lockPath, "wx");
    const acquiredAt = Date.now();
    const ownerId = `${process.pid}:${acquiredAt}:${Math.random()
        .toString(36)
        .slice(2)}`;
    try {
        await handle.writeFile(
            `${JSON.stringify(
                createSupervisorLockPayload(ownerId, acquiredAt, ttlMs),
            )}\n`,
            "utf8",
        );
    } catch (writeError) {
        await handle.close().catch(() => {});
        await safeUnlink(lockPath);
        throw writeError;
    }
    await handle.close();

    no vitest coverage exists for this path — a test that mocks handle.writeFile to throw ENOSPC and verifies the lock file is cleaned up and a subsequent acquisition succeeds would close the gap.

  2. lib/storage.ts, line 383-390 (link)

    realpath catch swallows EPERM/EACCES — silent fallback breaks symlink detection under permission errors

    the catch {} is documented as "path does not exist yet" but also silently eats permission errors (EACCES, EPERM). if the parent directory of the storage path is unreadable (common in sandboxed CI or on windows with locked directories), realpath throws EACCES and the function falls back to resolvePath. two paths that are symlinked to the same underlying file then compare as unequal, causing areEquivalentStoragePaths to return false and the exportAccounts path-mismatch guard to throw unnecessarily.

    restrict the catch to only ENOENT to match the stated intent:

    this also ensures unexpected filesystem errors surface instead of producing a silent wrong answer. worth a vitest coverage case for the ENOENT vs EACCES distinction.

Fix All in Codex

Prompt To Fix All With AI
This is a comment left during a code review.
Path: scripts/codex-supervisor.js
Line: 704-717

Comment:
**lock file stranded on write failure — blocks future acquisition until TTL expires**

`fs.open(lockPath, "wx")` creates the file exclusively, but if `handle.writeFile` then throws (e.g., `ENOSPC`, `EACCES`, intermittent I/O), the `finally` block only closes the handle without removing the empty file. the outer catch re-throws because the `writeFile` error code is not a transient lock-acquire code (`EEXIST` / windows `EPERM`/`EBUSY`), so the stranded file is never cleaned up.

on the next acquisition attempt any process will see `EEXIST`. `isSupervisorLockStale` will get `null` from `JSON.parse("")` → falls through to `fs.stat` → mtime is very recent → `isSupervisorLockStale` returns `false`. the lock is stuck until TTL expires (30 s default), which is the same class of concurrency problem as the stale-lock TOCTOU. this is a windows filesystem safety risk too since NTFS can produce `EPERM` on short-lived file contention.

```js
const handle = await fs.open(lockPath, "wx");
const acquiredAt = Date.now();
const ownerId = `${process.pid}:${acquiredAt}:${Math.random()
    .toString(36)
    .slice(2)}`;
try {
    await handle.writeFile(
        `${JSON.stringify(
            createSupervisorLockPayload(ownerId, acquiredAt, ttlMs),
        )}\n`,
        "utf8",
    );
} catch (writeError) {
    await handle.close().catch(() => {});
    await safeUnlink(lockPath);
    throw writeError;
}
await handle.close();
```

no vitest coverage exists for this path — a test that mocks `handle.writeFile` to throw `ENOSPC` and verifies the lock file is cleaned up and a subsequent acquisition succeeds would close the gap.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: lib/storage.ts
Line: 383-390

Comment:
**`realpath` catch swallows EPERM/EACCES — silent fallback breaks symlink detection under permission errors**

the `catch {}` is documented as "path does not exist yet" but also silently eats permission errors (`EACCES`, `EPERM`). if the parent directory of the storage path is unreadable (common in sandboxed CI or on windows with locked directories), `realpath` throws `EACCES` and the function falls back to `resolvePath`. two paths that are symlinked to the same underlying file then compare as unequal, causing `areEquivalentStoragePaths` to return `false` and the `exportAccounts` path-mismatch guard to throw unnecessarily.

restrict the catch to only `ENOENT` to match the stated intent:

```suggestion
	try {
		resolved = await fs.realpath(resolved);
	} catch (error: unknown) {
		// Fall back to the normalized input when the path does not exist yet.
		if (
			!error ||
			typeof error !== "object" ||
			!("code" in error) ||
			error.code !== "ENOENT"
		) {
			throw error;
		}
	}
```

this also ensures unexpected filesystem errors surface instead of producing a silent wrong answer. worth a vitest coverage case for the `ENOENT` vs `EACCES` distinction.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: scripts/codex-supervisor.js
Line: 1605-1630

Comment:
**`listJsonlFiles` unbounded recursive scan — I/O performance degrades with large session directories**

every call to `waitForSessionBinding` can trigger a full recursive directory walk (refreshed every `listingRefreshMs` ≈ 2 s). if a user accumulates hundreds of old `.jsonl` session files (which codex does not auto-prune), the scan grows unboundedly. each entry then goes through `readSessionBindingEntry` (an `fs.stat` per file) and potentially `extractSessionMeta` (opens each file). on windows, directory enumeration is slower and file handles are held longer, so this is both a windows filesystem safety risk and a general latency risk for interactive rotation.

consider capping the scan to the most-recently-modified N entries (e.g. 256) so the worst-case I/O is bounded:

```js
async function listJsonlFiles(rootDir, maxFiles = 256) {
    // ... existing walk ...
    // after collecting, sort by mtime desc and slice
    return files.sort((a, b) => b.mtimeMs - a.mtimeMs).slice(0, maxFiles);
}
```

no existing vitest test exercises behaviour under a large (100+) session directory; that coverage gap makes it hard to detect regressions here.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: lib/storage.ts
Line: 396-402

Comment:
**sequential `await` in `areEquivalentStoragePaths` — both realpath calls can run concurrently**

the two `normalizeStorageComparisonPath` calls are awaited sequentially. since they operate on independent paths, they can be parallelised:

```suggestion
async function areEquivalentStoragePaths(
	left: string,
	right: string,
): Promise<boolean> {
	const [normalizedLeft, normalizedRight] = await Promise.all([
		normalizeStorageComparisonPath(left),
		normalizeStorageComparisonPath(right),
	]);
	return normalizedLeft === normalizedRight;
}
```

each call may do a `fs.realpath` syscall; running them in parallel halves the await time on every `exportAccounts` call that lands inside an active transaction.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "fix: pin flagged sto..."

@chatgpt-codex-connector
Copy link

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 20, 2026

Warning

Rate limit exceeded

@ndycode has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 8 minutes and 25 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2d1b35c0-4b7a-43d9-a8ab-258512595714

📥 Commits

Reviewing files that changed from the base of the PR and between 500314f and 55333f1.

📒 Files selected for processing (13)
  • lib/codex-manager/settings-hub.ts
  • lib/quota-probe.ts
  • lib/storage.ts
  • lib/ui/copy.ts
  • package.json
  • scripts/codex-supervisor.js
  • scripts/codex.js
  • test/codex-bin-wrapper.test.ts
  • test/codex-supervisor.test.ts
  • test/plugin-config.test.ts
  • test/quota-probe.test.ts
  • test/settings-hub-utils.test.ts
  • test/storage.test.ts
📝 Walkthrough

Walkthrough

adds a gated session supervisor for the codex cli, integrates it into the startup flow, adds a backend toggle and config getter, adds abort-aware quota probing, refactors storage transaction path handling, and adds comprehensive supervisor and storage tests (scripts/codex-supervisor.js, lib/storage.ts, lib/quota-probe.ts, tests under test/).

Changes

Cohort / File(s) Summary
session supervisor config & ui
lib/codex-manager/settings-hub.ts, lib/config.ts, lib/schemas.ts, lib/ui/copy.ts
added codexCliSessionSupervisor backend toggle and experimental-menu action; added getCodexCliSessionSupervisor(pluginConfig) and default codexCliSessionSupervisor: false; schema accepts optional boolean; ui copy added (lib/ui/copy.ts). (see lib/config.ts:line ~1-40, lib/codex-manager/settings-hub.ts:line ~1-120).
supervisor implementation & routing
scripts/codex-supervisor.js, scripts/codex.js, scripts/codex-routing.js
new supervisor module implementing lock+heartbeat, account selection, quota-driven rotation, session binding scans, child restart escalation, prewarm logic, and test hooks; main entry now routes through runCodexSupervisorIfEnabled() and preserves startup sync semantics; added argument parsing helpers (findPrimaryCodexCommand, hasTopLevelHelpOrVersionFlag, splitCodexCommandArgs). (see scripts/codex-supervisor.js:line ~1-2356, scripts/codex.js:line ~1-80).
quota probe abort control
lib/quota-probe.ts
added signal?: AbortSignal to ProbeCodexQuotaOptions, abort checks before/after instruction fetch, per-request AbortController wiring with listener add/remove, and explicit AbortError handling/propagation. (see lib/quota-probe.ts:line ~1-120).
storage path equivalence & transactions
lib/storage.ts
added normalizeStorageComparisonPath and areEquivalentStoragePaths; refactored load/save to accept storagePath param; deterministic latestValidSnapshot sort with index tie-break; transaction handlers now run with state.active=false in finally; exportAccounts() blocks export when active transaction path is non-equivalent. watch lib/storage.ts:line ~40-140 for windows path normalization and transaction flow.
tests & fixtures
test/codex-supervisor.test.ts, test/codex-bin-wrapper.test.ts, test/storage.test.ts, test/quota-probe.test.ts, test/plugin-config.test.ts, test/settings-hub-utils.test.ts
added extensive supervisor tests and fixture stubs, expanded bin-wrapper tests to include supervisor forwarding and sync scenarios, added quota-probe abort tests, added plugin-config tests for env override, storage tests for path-drift and windows equivalence, and experimental settings hotkey tests. (see test/codex-supervisor.test.ts:line ~1-2340, test/storage.test.ts:line ~1-424).
package & scripts
package.json
added test:session-supervisor:smoke npm script covering supervisor-related test files.

Sequence Diagram(s)

sequenceDiagram
    participant caller as codex.js
    participant supervisor as codex-supervisor.js
    participant runtime as runtime/config
    participant lock as filesystem-lock
    participant storage as storage/accounts
    participant quota as quota-probe
    participant child as child-process
    participant codex as real-codex-binary

    caller->>supervisor: runCodexSupervisorIfEnabled(rawArgs)
    supervisor->>runtime: getCodexCliSessionSupervisor(pluginConfig)
    runtime-->>supervisor: true/false

    alt supervisor disabled
        supervisor-->>caller: null
        caller->>codex: forward args (startup sync)
    else supervisor enabled
        supervisor->>lock: acquire lock + heartbeat
        lock-->>supervisor: acquired
        supervisor->>storage: load account manager
        storage-->>supervisor: accounts

        alt non-interactive or not-eligible
            supervisor->>codex: forward args
            codex-->>supervisor: exit
        else interactive & eligible
            supervisor->>quota: probe current account (signal aware)
            quota-->>supervisor: snapshot
            alt rotate required
                supervisor->>child: signal SIGINT/SIGTERM/SIGKILL
                child-->>supervisor: exit
                supervisor->>storage: commit new account
                storage-->>supervisor: persisted
                supervisor->>codex: forward args with new account
            else prewarm
                supervisor->>quota: prewarm next account async
                supervisor->>codex: forward current account
            end
            codex-->>supervisor: exit
        end
        supervisor->>lock: release + cleanup
        supervisor-->>caller: exit code
    end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

notes (explicit risks to review)

  • concurrency & leak risks: review lock heartbeat and finally cleanup in scripts/codex-supervisor.js:line ~1-2356 and lib/storage.ts:line ~80-140 for race windows where state.active and storagePath diverge.
  • windows path handling: verify normalizeStorageComparisonPath() lowercases and normalizes separators and is used consistently in lib/storage.ts:line ~40-90. tests mutate process.platform (test/storage.test.ts:line ~1-200) but edge cases may remain.
  • quota abort semantics: ensure per-request listener cleanup covers all error paths in lib/quota-probe.ts:line ~1-120; tests added for aborts (test/quota-probe.test.ts:line ~1-80) but inspect listener detach on thrown sync errors.
  • fragile parsing: session binding scan in scripts/codex-supervisor.js:line ~500-800 parses .jsonl; add defensive parsing/logging for truncated or malformed lines.
  • missing regression coverage: add explicit tests for windows lock contention leaks, rapid quota rotation exhaustion, and concurrent exports during in-flight transaction mutation (test/* additions touch many areas but these specific regressions are not covered).
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 3.36% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ⚠️ Warning pr description covers summary, what changed, validation results, and risk/rollback, but critical validation items incomplete: full npm test and npm run build not run; docs checklist not addressed. complete full npm test and npm run build; explicitly address docs checklist items (readme, docs/getting-started.md, docs/features.md, docs/reference/* if applicable); document why baseline npm test failure is pre-existing.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed Title follows conventional commits format with type 'feat', scope omitted, and summary in lowercase imperative within 72 chars, clearly describing the main feature addition.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch git-clean/05-session-supervisor-single
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch git-clean/05-session-supervisor-single

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ndycode ndycode force-pushed the git-clean/05-session-supervisor-single branch from 8b878c5 to 98c5185 Compare March 20, 2026 19:23
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/storage.ts (1)

2146-2155: ⚠️ Potential issue | 🔴 Critical

capture the flagged path with the transaction too.

lib/storage.ts:2146-2155 freezes the accounts file, but saveFlaggedAccountsUnlocked() still resolves its destination from getFlaggedAccountsPath() at call time in lib/storage.ts:2415-2419. if the handler switches storage roots mid-transaction, accounts are written back to the original path while flagged state goes to the new path, so rollback no longer protects the same storage pair. please thread a captured flagged path through this transaction and add a drift regression alongside test/storage.test.ts:1095-1154; the current combined-transaction cases in test/storage.test.ts:1202-1345 never exercise this split-write path. as per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.

suggested fix
 async function saveFlaggedAccountsUnlocked(
 	storage: FlaggedAccountStorageV1,
+	storagePath = getFlaggedAccountsPath(),
 ): Promise<void> {
-	const path = getFlaggedAccountsPath();
+	const path = storagePath;
 	const markerPath = getIntentionalResetMarkerPath(path);
 	const uniqueSuffix = `${Date.now()}.${Math.random().toString(36).slice(2, 8)}`;
 	const tempPath = `${path}.${uniqueSuffix}.tmp`;
@@
 export async function withAccountAndFlaggedStorageTransaction<T>(
 	handler: (
@@
 ): Promise<T> {
 	return withStorageLock(async () => {
 		const storagePath = getStoragePath();
+		const flaggedStoragePath = join(dirname(storagePath), FLAGGED_ACCOUNTS_FILE_NAME);
 		const state = {
@@
 		const persist = async (
 			accountStorage: AccountStorageV3,
 			flaggedStorage: FlaggedAccountStorageV1,
 		): Promise<void> => {
 			const previousAccounts = cloneAccountStorageForPersistence(state.snapshot);
 			const nextAccounts = cloneAccountStorageForPersistence(accountStorage);
 			await saveAccountsUnlocked(nextAccounts, storagePath);
 			try {
-				await saveFlaggedAccountsUnlocked(flaggedStorage);
+				await saveFlaggedAccountsUnlocked(flaggedStorage, flaggedStoragePath);
 				state.snapshot = nextAccounts;
 			} catch (error) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/storage.ts` around lines 2146 - 2155, The persist transaction freezes
account file but calls saveFlaggedAccountsUnlocked which resolves its path at
call time; capture the flagged-path up-front (call getFlaggedAccountsPath()
inside persist into e.g. flaggedPathCaptured) and pass that path into
saveFlaggedAccountsUnlocked (and any helpers that resolve paths) so both
accounts and flagged writes target the same storage root; update
cloneAccountStorageForPersistence/state.snapshot usage unchanged but ensure
saveAccountsUnlocked and saveFlaggedAccountsUnlocked are called with the
captured paths, add a regression vitest near the existing storage tests that
simulates handler storage-root rotation mid-transaction to assert rollback still
restores both files, and update any affected queue logic to explicitly
retry/handle EBUSY and 429 per existing queue abstractions as noted in coding
guidelines.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/codex-manager/settings-hub.ts`:
- Around line 329-333: The codexCliSessionSupervisor toggle is currently listed
in BACKEND_TOGGLE_OPTIONS but not rendered by BACKEND_CATEGORY_OPTIONS, so
backendSettingsSnapshot() / buildBackendConfigPatch() will include it in backend
resets and silently clear it; fix by removing "codexCliSessionSupervisor" from
BACKEND_TOGGLE_OPTIONS (or alternatively add it to BACKEND_CATEGORY_OPTIONS and
ensure the backend UI renders it) so that the key is only in the shared backend
option set if the backend UI can actually display and preserve it—update the
array where BACKEND_TOGGLE_OPTIONS is defined and adjust any UI rendering logic
if you choose to move it into BACKEND_CATEGORY_OPTIONS.

In `@package.json`:
- Line 54: Update the smoke npm script "test:session-supervisor:smoke" to
include the storage regressions by adding test/storage.test.ts to the list so
the smoke run exercises the changed storage logic (the path-equivalence and
transaction/export safety code in lib/storage.ts around the path-drift and
windows-handling sections). Modify the package.json target that currently lists
tests like test/codex-supervisor.test.ts to also run test/storage.test.ts, then
commit the package.json change so CI's smoke job covers the storage changes.

In `@scripts/codex-supervisor.js`:
- Around line 551-585: Add a short in-source comment above the pendingRefresh
handler explaining that the read-then-write window in pendingRefresh (which
calls readSupervisorLockPayload and then writeSupervisorLockPayload using
lockPath, ownerId, acquiredAt, ttlMs) is not a critical TOCTOU because lock
acquisition uses exclusive create ('wx') so a concurrent supervisor cannot
silently take the same filename, and the code explicitly checks payload.ownerId
and calls stopWithError on mismatch; mention these two protections
(exclusive-create on acquisition and ownerId mismatch check) so future readers
understand the mitigation.
- Around line 34-35: The module-level maps snapshotProbeCache and
sessionRolloutPathById persist across process reloads and need a public reset
entrypoint; add and export a production-safe cache-reset function (e.g., export
function resetSupervisorCaches() or two exports resetSnapshotProbeCache() and
resetSessionRolloutPathCache()) that calls snapshotProbeCache.clear() and
sessionRolloutPathById.clear(), and document/ensure callers (such as the
supervisor runtime reload path) invoke it when reloading; keep the existing
__testOnly.clearAllProbeSnapshotCache and
__testOnly.clearSessionBindingPathCache behavior but provide this new public API
for embedded/long-running use.
- Around line 114-118: The parseNumberEnv helper currently signature
parseNumberEnv(name, fallback, min = 0) ignores an intended max bound; update
the signature to accept a fourth parameter (e.g., max = Infinity) and clamp the
parsed value between min and max (use Math.trunc then Math.max(min,
Math.min(max, value))). Modify parseNumberEnv accordingly and ensure callers
that pass a 4th arg (e.g., places that call parseNumberEnv(..., 0, 100) for
CODEX_AUTH_PREEMPTIVE_QUOTA_5H_REMAINING_PCT) behave correctly without further
changes.

In `@scripts/codex.js`:
- Around line 541-554: The branch after runCodexSupervisorIfEnabled that returns
supervisedExitCode without triggering autoSyncManagerActiveSelectionIfEnabled is
untested for the case supervisedExitCode === 0 and supervisorDidForward ===
false; add a deterministic vitest regression that simulates
runCodexSupervisorIfEnabled returning 0 while supervisorDidForward is false and
rawArgs is an interactive command (use isSupervisorInteractiveCommand to pick
args), then assert autoSyncManagerActiveSelectionIfEnabled was called once
(mock/spied) and the wrapper returns 0; reference runCodexSupervisorIfEnabled,
supervisedExitCode, supervisorDidForward, isSupervisorInteractiveCommand and
autoSyncManagerActiveSelectionIfEnabled in the test and avoid flaky timing by
stubbing dependencies instead of real concurrency or filesystem operations.

In `@test/codex-bin-wrapper.test.ts`:
- Around line 68-125: The fixture always hard-codes codexCliSessionSupervisor:
true in writeSupervisorRuntimeFixture which masks runtime env precedence; change
writeSupervisorRuntimeFixture to accept a boolean (e.g., enableSupervisor =
false) and write codexCliSessionSupervisor accordingly, then update the wrapper
test cases that previously relied on the fixture to call
writeSupervisorRuntimeFixture(fixtureRoot, false) and enable the supervisor only
via CODEX_AUTH_CLI_SESSION_SUPERVISOR=1 in the specific test that needs it so
tests exercise the env-path behavior rather than the hard-coded config.

In `@test/codex-supervisor.test.ts`:
- Around line 1326-1328: The busy-wait loop polling setImmediate to wait for
startedAccounts.size to reach 3 is flaky; replace it with a deterministic
deferred/promise that resolves when startedAccounts.size === 3 (and include a
short timeout fallback). Concretely, create a Promise (deferred) in the test and
have whatever code increments the startedAccounts Set resolve that promise as
soon as it observes startedAccounts.size === 3; then await that promise instead
of the for-loop. Ensure the promise also rejects or resolves after a reasonable
timeout to avoid hanging tests.
- Around line 9-26: The env cleanup list envKeys in
test/codex-supervisor.test.ts is missing the environment variable
CODEX_AUTH_CLI_SESSION_LOCK_TTL_MS which the tests set; add
"CODEX_AUTH_CLI_SESSION_LOCK_TTL_MS" to the envKeys const so the test harness
clears it between tests (update the envKeys array near the existing CODEX_AUTH_*
entries and re-run the test suite to ensure no state leakage).

In `@test/plugin-config.test.ts`:
- Around line 991-1008: The test suite misses the explicit environment opt-out
case: add a deterministic vitest test that sets
process.env.CODEX_AUTH_CLI_SESSION_SUPERVISOR = '0', calls
getCodexCliSessionSupervisor({ codexCliSessionSupervisor: true }) and asserts
the result is false, then cleans up the env var; this verifies the env override
in getCodexCliSessionSupervisor correctly wins over a true config value and
prevents regressions in the opt-out branch.

In `@test/storage.test.ts`:
- Around line 1008-1044: Update the test that simulates Windows platform to also
exercise separator normalization: when running inside
withAccountStorageTransaction (using setStoragePathDirect, exportAccounts,
saveAccounts, testStoragePath, exportPath) set the storage path to a
Windows-style path with backslashes (e.g., transform testStoragePath to use "\"
separators) in addition to uppercasing, so the export path comparison covers
both case-insensitivity and separator-insensitivity; ensure the modified test
still toggles process.platform to "win32" and restores it in finally, and assert
the exported accountId as before.

---

Outside diff comments:
In `@lib/storage.ts`:
- Around line 2146-2155: The persist transaction freezes account file but calls
saveFlaggedAccountsUnlocked which resolves its path at call time; capture the
flagged-path up-front (call getFlaggedAccountsPath() inside persist into e.g.
flaggedPathCaptured) and pass that path into saveFlaggedAccountsUnlocked (and
any helpers that resolve paths) so both accounts and flagged writes target the
same storage root; update cloneAccountStorageForPersistence/state.snapshot usage
unchanged but ensure saveAccountsUnlocked and saveFlaggedAccountsUnlocked are
called with the captured paths, add a regression vitest near the existing
storage tests that simulates handler storage-root rotation mid-transaction to
assert rollback still restores both files, and update any affected queue logic
to explicitly retry/handle EBUSY and 429 per existing queue abstractions as
noted in coding guidelines.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 10ee12b2-f7cf-4b80-b8cf-a2aeaefb3006

📥 Commits

Reviewing files that changed from the base of the PR and between 1be5e95 and 500314f.

📒 Files selected for processing (16)
  • lib/codex-manager/settings-hub.ts
  • lib/config.ts
  • lib/quota-probe.ts
  • lib/schemas.ts
  • lib/storage.ts
  • lib/ui/copy.ts
  • package.json
  • scripts/codex-routing.js
  • scripts/codex-supervisor.js
  • scripts/codex.js
  • test/codex-bin-wrapper.test.ts
  • test/codex-supervisor.test.ts
  • test/plugin-config.test.ts
  • test/quota-probe.test.ts
  • test/settings-hub-utils.test.ts
  • test/storage.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/schemas.ts
  • lib/ui/copy.ts
  • lib/config.ts
  • lib/quota-probe.ts
  • lib/codex-manager/settings-hub.ts
  • lib/storage.ts
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/quota-probe.test.ts
  • test/settings-hub-utils.test.ts
  • test/codex-bin-wrapper.test.ts
  • test/plugin-config.test.ts
  • test/storage.test.ts
  • test/codex-supervisor.test.ts
🪛 ast-grep (0.41.1)
test/storage.test.ts

[warning] 998-1002: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
storage path mismatch: transaction path is ${escapeRegExp( testStoragePath, )}, active path is ${escapeRegExp(secondaryStoragePath)},
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🔇 Additional comments (18)
scripts/codex-supervisor.js (8)

37-60: lgtm - sleep with abort support.

proper cleanup of abort listeners and timer unref for process exit. the early return on signal?.aborted (line 39-41) prevents unnecessary timer allocation.


120-130: lgtm - env overrides read at call time.

getMaxAccountSelectionAttempts() and getMaxSessionRestarts() correctly read env vars at invocation rather than module load, honoring runtime overrides. this aligns with the commit message fix.


717-749: lgtm - windows transient lock failures handled.

EPERM and EBUSY during lock creation trigger retry after checking for stale locks (line 722-724). test coverage at test/codex-supervisor.test.ts:598-648 validates both error codes. the sleep at line 745 respects the abort signal.


1088-1095: lgtm - correct abort vs unavailable distinction.

line 1088-1093 correctly differentiates: the caller who aborted gets AbortError, while concurrent waiters on the same probe get QuotaProbeUnavailableError. this prevents aborting one caller from killing other waiters. test at test/codex-supervisor.test.ts:935-977 validates this behavior.


1729-1754: lgtm - platform-aware signal escalation.

non-windows gets SIGINT first (line 1742), then SIGTERM, then SIGKILL. windows skips SIGINT since it's not well-supported. tests at test/codex-supervisor.test.ts:448-468 (windows) and test/codex-supervisor.test.ts:470-487 (linux) cover both paths.


2150-2155: lgtm - prewarm selection cleanup in finally.

preparedResumeSelectionLink.cleanup() at line 2151 aborts the linked controller, and line 2152-2154 awaits the promise with .catch(() => null) to prevent unhandled rejection. this ensures the prewarm probe is cancelled when the session exits without rotation. test at test/codex-supervisor.test.ts:777-863 validates this.


2203-2241: lgtm - signal handler cleanup.

process.once at lines 2211-2212 and process.off at lines 2238-2239 ensure SIGINT/SIGTERM handlers are properly registered and removed. the finally block guarantees cleanup even on early returns.


1541-1583: lgtm - stream cleanup in finally.

extractSessionMeta properly cleans up lineReader and stream in the finally block (lines 1577-1579). the optional chaining handles cases where creation fails early.

test/codex-supervisor.test.ts (10)

31-48: lgtm - robust temp dir cleanup for windows.

removeDirectoryWithRetry handles ENOTEMPTY, EPERM, EBUSY with backoff. this is necessary on windows where file handles may linger briefly.


64-158: lgtm - FakeManager test double.

correctly implements the account management interface with stateful cooling down, index tracking, and candidate selection. saveToDisk is a no-op which is appropriate for unit tests.


224-254: lgtm - fake quota probe with abort support.

fetchCodexQuotaSnapshot at line 224-253 correctly handles abort signals and supports test hooks (onFetch, onFetchStart, waitForFetchByAccountId). this enables testing concurrent probe sharing and abort scenarios.


650-705: lgtm - lock serialization test.

"serializes concurrent callers behind the supervisor storage lock" uses deferreds to control entry/exit timing deterministically. the order assertion at line 704 validates that first-enterfirst-exitsecond-enter.


1086-1162: lgtm - token refresh race regression.

this test validates that commitPreparedSelection re-resolves the account from storage at commit time (lines 1147-1160), ensuring stale prewarm snapshots don't override a refreshed token. this is a critical regression case per coding guidelines.


448-487: lgtm - signal escalation tests.

lines 448-468 test abort-interrupted restart waits, lines 470-487 test the SIGINTSIGTERMSIGKILL sequence on linux. both use vi.useFakeTimers() for deterministic timing.


1714-1769: lgtm - monitor loop failure test.

validates that exceptions from loadCurrentState during the monitor loop (line 1756-1758) result in exit code 1 and a user-facing error message on stderr. important for debugging production issues.


257-272: lgtm - afterEach cleanup.

properly restores env vars, clears supervisor caches, resets timers, and removes temp directories. reverse iteration at line 269 handles nested directory cleanup correctly.


274-299: lgtm - boundary test for session metadata scan.

tests the 200-line scan limit behavior: line 199 metadata is found (lines 275-299), line 201 metadata is missed (lines 301-321). deterministic test data with explicit assertions.


1561-1598: regression test for storage path mismatch already exists.

test/storage.test.ts:964-1005 validates the exact behavioral change—"fails fast when export is called from an active transaction after the storage path changes" explicitly tests that exportAccounts throws with "storage path mismatch: transaction path is X, active path is Y" when the path diverges during a transaction.

@ndycode ndycode added the needs-human-review Flagged by automated PR quality screening; maintainer review required. label Mar 20, 2026
@ndycode ndycode changed the title Add an experimental Codex session supervisor feat: add experimental Codex session supervisor Mar 20, 2026
@ndycode
Copy link
Owner Author

ndycode commented Mar 21, 2026

@coderabbitai full review

@coderabbitai summary

@ndycode
Copy link
Owner Author

ndycode commented Mar 21, 2026

@greptileai please review this PR again and refresh the summary for the latest branch state.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 21, 2026

@ndycode — on it. triggering a full review pass and regenerating the summary for the latest branch state now.

✅ Actions performed

Summary regeneration triggered.

@ndycode
Copy link
Owner Author

ndycode commented Mar 21, 2026

@greptileai please review this PR again and refresh the summary for the latest branch state.

@ndycode
Copy link
Owner Author

ndycode commented Mar 21, 2026

Superseded by #206 and #207.

I rebuilt this from fresh main worktrees to make the review surface smaller:

Closing this one so review stays on the clean replacements.

@ndycode ndycode closed this Mar 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-human-review Flagged by automated PR quality screening; maintainer review required.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant